Nested Calls
One way to handle subroutines is to store the return address in a special CPU register and, on return, copy that register back into the instruction pointer. This cannot handle nested calls, though, and most reasonably complex programs use multiple subroutines that call each other: as soon as a second call happens, the return address for the first call is overwritten.
To overcome this problem, return addresses are pushed onto a stack at the top of the running program's memory, which is treated separately and accessed directly through CPU registers.
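The difference between the two schemes can be sketched in C. This is only an illustrative model, not real CPU behaviour: the ReturnModel type, its fields and the addresses used are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 16   /* arbitrary limit for this sketch */

/* Hypothetical model: a single return register next to a return-address stack. */
typedef struct {
    int    link_register;      /* holds only the most recent return address */
    int    stack[MAX_DEPTH];   /* holds every outstanding return address */
    size_t depth;              /* number of addresses currently stacked */
} ReturnModel;

/* Record a call whose return address is `return_address`. */
void model_call(ReturnModel *m, int return_address) {
    assert(m->depth < MAX_DEPTH);
    m->link_register = return_address;       /* overwrites any earlier value */
    m->stack[m->depth++] = return_address;   /* earlier values are preserved */
}

/* Return using the stack: yields the most recent outstanding address. */
int model_return(ReturnModel *m) {
    assert(m->depth > 0);
    return m->stack[--m->depth];
}
```

After two nested calls, link_register only remembers the second return address, while two calls to model_return give back the second address and then the first.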
Machine Level Stacks
Stacks have two operations: Push, and Pop
The Intel x86 architecture uses main memory for the stack but accesses it via a register
ESP - stack pointer, always points to the memory address of the top item on the stack
Remember: the stack grows downwards in memory. The program code sits at the bottom of memory and goes up (from position 0 to 1 to 2, etc.), whereas the program's stack starts from the top of its allocated memory and works its way down
The push instruction:
- Decrements ESP so it points to the next free area of memory on the stack
- Writes the data item to that address
The pop instruction:
- Moves the data addressed by ESP into the given register
- Increments ESP by the correct amount to remove the item from the stack
- Note that the data stays in memory until it is overwritten; the stack pointer just moves past it
The programmer must take care to tidy the stack and ensure items are removed when no longer needed (pop whatever has been pushed)
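The push and pop steps can be modelled in C with a small byte array standing in for memory. This is a sketch under assumptions: the Machine type, the 64-byte memory size and the function names are invented for illustration, and every item is taken to be 4 bytes wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 64   /* bytes of simulated memory (an arbitrary choice) */

typedef struct {
    uint8_t  mem[MEM_SIZE];
    uint32_t esp;          /* address of the top item; MEM_SIZE means empty */
} Machine;

void machine_init(Machine *m) {
    memset(m->mem, 0, sizeof m->mem);
    m->esp = MEM_SIZE;     /* the stack starts at the top of memory */
}

/* push: decrement ESP by 4, then write the value at the new address. */
void push32(Machine *m, uint32_t value) {
    assert(m->esp >= 4);
    m->esp -= 4;
    memcpy(&m->mem[m->esp], &value, 4);
}

/* pop: read the value addressed by ESP, then increment ESP by 4. */
uint32_t pop32(Machine *m) {
    uint32_t value;
    assert(m->esp + 4 <= MEM_SIZE);
    memcpy(&value, &m->mem[m->esp], 4);
    m->esp += 4;           /* the bytes stay in mem; only ESP moves */
    return value;
}

/* Read the 32-bit value at ESP + offset without moving ESP. */
uint32_t peek32(const Machine *m, uint32_t offset) {
    uint32_t value;
    assert(m->esp + offset + 4 <= MEM_SIZE);
    memcpy(&value, &m->mem[m->esp + offset], 4);
    return value;
}
```

Pushing 7 and then 9 leaves ESP at MEM_SIZE - 8; popping returns 9 first, and the bytes holding 9 are still present in mem afterwards until a later push overwrites them.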
Call and Return
The call instruction:
- Pushes the current value of EIP onto the stack (by this point EIP already holds the address of the instruction after the call, i.e. the return address)
- Puts the address of subroutine into EIP
The ret instruction:
- Pops the top item off the stack and places it into EIP
This allows for nested subroutines with correct, maintained return addresses
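A minimal C model of call and ret, assuming a word-indexed stack and made-up addresses. The Cpu type and the do_call and do_ret helpers are hypothetical; call_length is the size of the call instruction in bytes (a near relative call on 32-bit x86 is 5 bytes).

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16   /* arbitrary stack size for this sketch */

typedef struct {
    uint32_t eip;                  /* address of the instruction being executed */
    uint32_t stack[STACK_WORDS];   /* one element per 4-byte stack slot */
    int      sp;                   /* index of the top slot; STACK_WORDS = empty */
} Cpu;

void cpu_init(Cpu *c, uint32_t entry) {
    c->eip = entry;
    c->sp = STACK_WORDS;
}

/* call: push the address of the next instruction, then jump to the target. */
void do_call(Cpu *c, uint32_t call_length, uint32_t target) {
    c->eip += call_length;        /* EIP now addresses the next instruction */
    assert(c->sp > 0);
    c->stack[--c->sp] = c->eip;   /* push the return address */
    c->eip = target;              /* continue at the subroutine */
}

/* ret: pop the top of the stack into EIP. */
void do_ret(Cpu *c) {
    assert(c->sp < STACK_WORDS);
    c->eip = c->stack[c->sp++];
}
```

With a call at 0x1000 to a subroutine at 0x2000, which itself begins with a call to 0x3000, the two do_ret steps return to 0x2005 and then 0x1005, so each level gets its own return address back.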
Manipulating the Stack Pointer
ESP can be changed directly from the code
eg. to take 8 bytes off the stack: add esp, 8
Note that you can inspect any data on the stack as an offset to ESP
Subroutine Parameters
Pass By Value
A simple subroutine like this uses pass by value (values copied into registers)
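; SUB bigger
bigger: cmp eax, ebx    ; compare the two parameters
        jl second       ; if eax < ebx, ebx holds the bigger value
        ret             ; otherwise eax already holds the result
second: mov eax, ebx    ; return the bigger value in eax
        ret
; END bigger
...
        mov eax, num1
        mov ebx, num2
        call bigger
        mov max, eax
The calling code above executes max = maximum(num1, num2)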
This depends on the caller and callee agreeing on which registers to use for the parameters and return value
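For comparison, a C sketch of the same routine. In C the compiler decides where parameters and the return value live, so the register names in the comments only map the lines back to the assembly above.

```c
/* Pass by value: a and b are copies, so the caller's variables are untouched. */
int bigger(int a, int b) {
    if (a < b) {   /* cmp eax, ebx / jl second */
        a = b;     /* mov eax, ebx: changes only the local copy */
    }
    return a;      /* the result comes back in the agreed place (EAX above) */
}
```

Calling bigger(num1, num2) leaves num1 and num2 unchanged, because the function only ever sees copies of their values.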
Pass By Reference
A function that for example swaps two variables needs the memory locations, not just the values, so memory addresses are needed as parameters (pass by reference)
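; SUB swap
swap:   mov ecx, [eax]  ; load the first value through its address
        mov edx, [ebx]  ; load the second value through its address
        mov [ebx], ecx  ; store the first value in the second location
        mov [eax], edx  ; store the second value in the first location
        ret
; END swap
...
        lea eax, num1   ; pass the addresses, not the values
        lea ebx, num2
        call swap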
The caller and the callee still need to agree on which registers to use
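The C equivalent uses pointers, which play the role of the addresses loaded by lea. This is a sketch for comparison; the comments map each line to the assembly above.

```c
/* Pass by reference: the callee receives addresses rather than values. */
void swap(int *a, int *b) {
    int first  = *a;   /* mov ecx, [eax] */
    int second = *b;   /* mov edx, [ebx] */
    *b = first;        /* mov [ebx], ecx */
    *a = second;       /* mov [eax], edx */
}
```

A call such as swap(&num1, &num2) changes the caller's own variables, which a pass-by-value routine could not do.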
(Note)
Intel x86 has an instruction that swaps the values held in two registers: xchg eax, ebx
One operand can be a memory location, but this is slow because the processor automatically locks the memory access for the duration of the exchange (a concurrency safeguard)
Stacking Parameters
If many parameters are needed, or registers are already in use for other data, you can stack the parameters:
- Caller pushes parameters before making the call
- Callee pops parameters and uses them
- Stack must be tidied up
- Both caller and callee need to agree on order of parameters and who tidies the stack
For example, for a rectangle area function:
Callee cleans the stack (stdcall)
; SUB area
area:   pop ebx         ; the return address
        pop edx         ; height (pushed last)
        pop eax         ; width (pushed first)
        mul edx         ; eax = width * height (mul also overwrites edx)
        push ebx        ; the return address
        ret
; END area
...
        push width
        push height
        call area
        mov result, eax
...
Caller cleans the stack (cdecl)
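; SUB area
area:   mov eax, [esp+4]        ; height (pushed last, just above the return address)
        mul dword ptr [esp+8]   ; eax = height * width
        ret
; END area
...
        push width
        push height
        call area
        add esp, 8              ; caller removes the two parameters
        mov result, eax
...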
Calling Conventions
Caller and Callee must agree on a calling convention
- Caller pushes parameters to stack in a given order
- If callee tidies, it must pop parameters as it uses them
- If caller tidies, callee must access parameters via ESP offsets
- Return value location must be pre-agreed (EAX in the above examples)
Immediately after the call instruction is executed:
- The return address will be the top thing on the stack
- If callee is cleaning stack, it must pop/save the address then push it back at the end
- If caller is cleaning stack, callee must include it in stack offset calculations
If the return address is not accounted for, ret will pop the wrong value into EIP and execution will jump to an unintended address
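The two clean-up schemes used by the area routines can be modelled in C on a simulated stack. This is a sketch only: the Cpu type, the helper names and the value 0x1234 standing in for the return address are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16   /* arbitrary stack size for this sketch */

typedef struct {
    uint32_t stack[STACK_WORDS];   /* one element per 4-byte stack slot */
    int      sp;                   /* index of the top slot; STACK_WORDS = empty */
    uint32_t eax;                  /* agreed return-value register */
    uint32_t eip;
} Cpu;

static void push(Cpu *c, uint32_t v) { assert(c->sp > 0); c->stack[--c->sp] = v; }
static uint32_t pop(Cpu *c) { assert(c->sp < STACK_WORDS); return c->stack[c->sp++]; }

/* Callee cleans: save the return address, pop both parameters, push it back. */
void area_callee_cleans(Cpu *c) {
    uint32_t return_address = pop(c);
    uint32_t height = pop(c);
    uint32_t width  = pop(c);
    c->eax = width * height;
    push(c, return_address);
    c->eip = pop(c);                                       /* ret */
}

/* Caller cleans: read parameters at offsets that skip the return address. */
void area_caller_cleans(Cpu *c) {
    c->eax = c->stack[c->sp + 1] * c->stack[c->sp + 2];   /* [esp+4] * [esp+8] */
    c->eip = pop(c);                                       /* ret */
}

/* The caller's side: push parameters, call, and tidy up if required. */
uint32_t call_area(Cpu *c, uint32_t width, uint32_t height, int callee_cleans) {
    push(c, width);
    push(c, height);
    push(c, 0x1234);     /* stands in for the address pushed by call */
    if (callee_cleans) {
        area_callee_cleans(c);
    } else {
        area_caller_cleans(c);
        c->sp += 2;      /* add esp, 8 */
    }
    return c->eax;
}
```

Either way the stack pointer ends up where it started; the only difference is which side moves it past the parameters.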
Intel x86 Calling Conventions
Four calling conventions are in common use on 32-bit Intel x86 (they are defined by compilers and operating systems rather than by the processor itself)
- cdecl - Push parameters on stack in reverse order (right to left); caller cleans stack
- fastcall - First two parameters in ECX/EDX, rest reversed on stack; callee cleans stack
- stdcall - Push parameters on stack in reverse order; callee cleans stack
- thiscall - The this pointer (the implicit first parameter of a C++ member function) in ECX, rest reversed on stack; callee cleans stack
The fastcall and thiscall conventions are 'faster' when there are few parameters, because those parameters stay in registers, but they tie up registers that may be better used for something else
C library routines expect the programmer to use the cdecl convention.
I/O
I/O is hard in pure assembly, so instead:
- Specific registers are used to point to the address of the data in memory
- Trigger a CPU interrupt to pass control to the OS
- Device drivers perform I/O by liaising with hardware
External subroutines (C library code) can be called in the same way as assembly subroutines, but we must follow cdecl
printf - Send formatted output to the console
scanf - Wait for input from the console
Program Output
To output things, printf is used:
- It takes a format string as its first parameter
- We pass the string by reference, i.e. we push its address
Following the cdecl convention:
- Push the parameter to the stack
- Use pass by reference
- Clean up the stack afterwards
For example, to output a message:
#include <stdio.h>
#include <stdlib.h>
int main (void) {
    char msg[] = "Hello World\n";
    _asm {
        lea eax, msg
        push eax
        call printf
        pop eax
    }
    return 0;
}
Corrupted Registers
We don't know exactly what happens inside an external subroutine, but it will probably use registers and overwrite their contents (under cdecl the callee is free to change EAX, ECX and EDX), so any register values that matter to the calling code must be saved beforehand (use the stack)
Using the Stack
Save things onto the stack and then restore them after the external call returns
For example, for a program to output the string 10 times, the loop counter (ECX) must be maintained:
mov ecx, 10
floop: push ecx ;counter onto the stack
lea eax, msg
push eax
call printf
pop eax
pop ecx ;bring the counter back
loop floop
This ensures the code works as intended even if the external subroutine uses ECX: the counter is saved before the call and restored after the subroutine returns, before the loop instruction uses it
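The effect of saving the counter can be imitated in C. This is a toy model under assumptions: external_routine is a hypothetical stand-in for any library call that overwrites ECX, and ITERATION_CAP exists only so that the broken variant stops.

```c
#include <assert.h>
#include <stdint.h>

#define ITERATION_CAP 100   /* safety limit so the unsaved variant terminates */

typedef struct {
    uint32_t ecx;        /* loop counter register */
    uint32_t saved[4];   /* tiny stack for saved register values */
    int      depth;      /* number of values currently saved */
} Cpu;

/* Hypothetical external routine: it is free to overwrite ECX. */
static void external_routine(Cpu *c) { c->ecx = 0xFFFFFFFFu; }

/* Returns how many times the external routine ended up being called. */
int repeat_call(Cpu *c, uint32_t count, int save_ecx) {
    int calls = 0;
    c->ecx = count;                               /* mov ecx, count */
    do {
        if (save_ecx) {                           /* push ecx */
            assert(c->depth < 4);
            c->saved[c->depth++] = c->ecx;
        }
        external_routine(c);                      /* call printf */
        calls++;
        if (save_ecx) {                           /* pop ecx */
            c->ecx = c->saved[--c->depth];
        }
        c->ecx--;                                 /* loop: decrement ECX ... */
    } while (c->ecx != 0 && calls < ITERATION_CAP);   /* ... repeat while non-zero */
    return calls;
}
```

With saving enabled, a count of 10 gives exactly 10 calls. Without it, the counter is overwritten on every pass and the loop only stops because of the cap.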
Outputting Values
- The printf subroutine can take extra parameters holding values to be output
- Each parameter is inserted into the string in place of a format specifier
- Parameters are inserted in the order that they appear in the parameter list
- E.g. to output someone's name and age:
  - Param 1: "I am %s and I am %d years old\n"
  - Param 2: "Bob"
  - Param 3: 21
- Assuming parameters are passed in the correct order, this would output:
  - "I am Bob and I am 21 years old"
Format Specifiers
- %d - Display as a decimal integer
- %s - Display as a string
- %c - Display as a single character
- %f - Display as a floating-point number
Parameters must match the specifiers in the string
- Types must match
- Number of parameters must match
- Parameters in correct order
If this is not done correctly, the behaviour is undefined: the program may print garbage or simply crash
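The name-and-age example can be tried from plain C. The sketch below uses snprintf, which accepts the same format specifiers as printf but writes into a buffer, so the result can be inspected; format_intro is a hypothetical helper name.

```c
#include <stdio.h>

/* Build the example message in buf.
   Returns the number of characters the full message needs,
   not counting the terminating '\0'. */
int format_intro(char *buf, size_t size, const char *name, int age) {
    /* %s consumes name and %d consumes age, in parameter order. */
    return snprintf(buf, size, "I am %s and I am %d years old\n", name, age);
}
```

Calling format_intro(buf, sizeof buf, "Bob", 21) with a 64-byte buffer produces the line shown above. Swapping the last two arguments would break the type match between specifiers and parameters.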
For example:
char msg[] = "The number is %d\n";
int num = 7;
_asm {
    push num        // Parameters pushed in reverse order (cdecl)
    lea eax, msg
    push eax
    call printf
    add esp, 8
}
Adding to ESP is a quick way to clean up multiple parameters at once
Program Input
To input data, scanf is used
It takes two parameters:
- Param 1: A format specifier to indicate the type of data
- Param 2: The memory address where the data should be stored
Following the cdecl convention:
- Push the parameters to the stack in reverse order
- Use pass by reference
- Clean up the stack after
Strings can be taken as input if care is taken to reserve enough memory
- Use the %s format specifier
- Declare a char array that is big enough to store what the user enters (it is easy to overflow)
For example:
char fmt[] = "%d";
int num;
_asm {
    lea eax, num    // Remember the address is needed, not the value
    push eax        // Params pushed in reverse order
    lea eax, fmt
    push eax
    call scanf
    add esp, 8
}
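The same input logic can be sketched in plain C. To keep the example self-checking it uses sscanf, which parses from a string instead of the console but takes the same format specifiers and address parameters as scanf. read_int and read_word are hypothetical helper names, and the 16-byte buffer size is an arbitrary choice.

```c
#include <stdio.h>

/* Parse one integer from text. Returns 1 on success, 0 otherwise. */
int read_int(const char *text, int *out) {
    /* out is already an address, matching "lea eax, num" in the assembly. */
    return sscanf(text, "%d", out) == 1;
}

/* Read one word into a 16-byte buffer. Returns 1 on success, 0 otherwise. */
int read_word(const char *text, char word[16]) {
    /* The width 15 stops sscanf before it overflows the buffer,
       leaving one byte for the terminating '\0'. */
    return sscanf(text, "%15s", word) == 1;
}
```

A width in the specifier, as in %15s, is one way to avoid the overflow risk when reading strings: longer input is cut off at 15 characters.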
Stacking Local Variables
In high level languages, subroutines can have local (internal) variables that only exist while the subroutine is active. This can be done in assembly using the stack
Stack Frames
Each time a subroutine is called, a new stack frame is created on the stack
This holds data that is needed by the subroutine:
- Parameters
- Return address
- Local variables
With nested calls, several stack frames will be present on the stack
Along with ESP, the CPU has another register, EBP (stack base pointer):
- This always points to the start of the current stack frame
- Can be used to access parameters and local variables using an offset
- e.g. when EBP is set immediately after the old EBP is pushed, EBP+8 is the address of the last parameter pushed and EBP-4 is the address of the first local variable
Building the Stack Frame
ESP always points to the top of the stack
EBP initially points to the base of the stack
When a subroutine is called:
- Parameters are pushed to the stack first
- Then the return address is pushed
- Then the value of EBP is pushed
- The current value of ESP is put into EBP (to begin a new stack frame)
- Local variables are reserved on the stack (causing ESP to change)
When a subroutine is ready to return:
- Remove any local variables from the stack
- Pop top value into EBP (restore previous stack frame)
- Pop top value into EIP (move execution back to caller)
- Caller is responsible for cleaning any parameters still on the stack
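Building and tearing down a frame can be sketched in C on a simulated stack. The sketch assumes the conventional sequence push ebp / mov ebp, esp / sub esp, n, in which EBP is set before the locals are reserved. The Cpu type and helper names are hypothetical, and the stack is indexed in 4-byte slots rather than bytes.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 32   /* arbitrary stack size for this sketch */

typedef struct {
    uint32_t stack[STACK_WORDS];   /* one element per 4-byte stack slot */
    int      esp;                  /* index of the top slot; STACK_WORDS = empty */
    int      ebp;                  /* index of the saved-EBP slot of the current frame */
} Cpu;

static void push(Cpu *c, uint32_t v) { assert(c->esp > 0); c->stack[--c->esp] = v; }
static uint32_t pop(Cpu *c) { assert(c->esp < STACK_WORDS); return c->stack[c->esp++]; }

/* Entry: save the caller's EBP, start a new frame, reserve the locals. */
void enter_frame(Cpu *c, int local_words) {
    push(c, (uint32_t)c->ebp);     /* push ebp */
    c->ebp = c->esp;               /* mov ebp, esp */
    assert(c->esp >= local_words);
    c->esp -= local_words;         /* sub esp, 4 * local_words */
}

/* Exit: discard the locals and restore the caller's EBP. */
void leave_frame(Cpu *c) {
    c->esp = c->ebp;               /* mov esp, ebp */
    c->ebp = (int)pop(c);          /* pop ebp */
}

/* Parameter n (0 = the last one pushed) lives at [ebp + 8 + 4n]. */
uint32_t *param(Cpu *c, int n) { return &c->stack[c->ebp + 2 + n]; }

/* Local n (0 = the first one reserved) lives at [ebp - 4 - 4n]. */
uint32_t *local(Cpu *c, int n) { return &c->stack[c->ebp - 1 - n]; }
```

Because EBP stays fixed for the lifetime of the frame, the offsets to parameters and locals stay the same even while ESP moves with further pushes and pops.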
Nested Calls and Stack Frames
If a subroutine calls another (nested) subroutine:
- The stack grows as a new stack frame is built up:
  - Parameters
  - Return address
  - Old base pointer
  - Local variables
- Values of EBP and ESP change as the calls happen:
  - ESP always points to the top of the stack
  - EBP changes with each subroutine call
- The stack is cleaned up (gets smaller) as each subroutine returns
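Since every frame stores the previous EBP, the frames form a chain that can be followed back to the outermost caller. The C sketch below shows this on a simulated stack. It assumes each call has one parameter and one local variable and that EBP is set immediately after the old EBP is pushed; the Cpu type, the NO_FRAME sentinel and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64
#define NO_FRAME    STACK_WORDS   /* sentinel EBP value meaning "no caller frame" */

typedef struct {
    uint32_t stack[STACK_WORDS];   /* one element per 4-byte stack slot */
    int      esp;                  /* index of the top slot */
    int      ebp;                  /* index of the current frame's saved-EBP slot */
} Cpu;

static void push(Cpu *c, uint32_t v) { assert(c->esp > 0); c->stack[--c->esp] = v; }

/* Simulate a call: one parameter, return address, old EBP, one local. */
void nested_call(Cpu *c, uint32_t arg, uint32_t return_address) {
    push(c, arg);
    push(c, return_address);
    push(c, (uint32_t)c->ebp);   /* old base pointer */
    c->ebp = c->esp;             /* the new frame begins here */
    push(c, 0);                  /* one local variable */
}

/* Simulate the matching return, including the caller's clean-up. */
void nested_return(Cpu *c) {
    c->esp = c->ebp;                    /* drop the locals */
    c->ebp = (int)c->stack[c->esp++];   /* restore the caller's EBP */
    c->esp++;                           /* ret pops the return address */
    c->esp++;                           /* caller removes the parameter */
}

/* Count active frames by following the chain of saved EBP values. */
int frame_depth(const Cpu *c) {
    int depth = 0;
    for (int fp = c->ebp; fp != NO_FRAME; fp = (int)c->stack[fp]) {
        depth++;
    }
    return depth;
}
```

Three nested calls use twelve stack slots and give a depth of 3; each return releases four slots and shortens the chain by one frame.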